Probit Regression with Correlated Label Noise: An EM-EP Approach

Authors

  • Stephan Mandt
  • Florian Wenzel
Abstract

Probit regression and logistic regression are well-known models for classification. In contrast to logistic regression, probit regression has a canonical generalization that allows us to model correlations between the labels. This provides a way to include metadata in the model, with the metadata inducing correlations in the noisy observation process. We show that this approach leads to the mathematical problem of integrating a high-dimensional Gaussian density over the positive orthant. We derive a novel parameter estimation algorithm for this correlated probit regression model. We interpret the noise as a latent variable, which leads to a natural formulation of our algorithm as an expectation-maximization (EM) scheme. Each partial M-step is a gradient step, and we can express the gradient in terms of moments of the truncated multivariate Gaussian. Calculating these moments in the E-step is expensive using traditional methods. Instead, we use a recent application of expectation propagation (EP) to Gaussian densities. The resulting EM-EP scheme is much faster and thus allows us to treat large data sets.
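
To make the orthant integral mentioned in the abstract concrete, the following is a minimal sketch under assumptions that the abstract itself does not state. It assumes the common formulation y_i = sign(x_i . w + eps_i) with eps ~ N(0, Sigma), so that the likelihood of a label vector y is the probability that y_i * (x_i . w + eps_i) > 0 holds for every i, which is a Gaussian integral over the positive orthant. The estimate below uses plain Monte Carlo sampling, not the EM-EP scheme of the paper, and the names cholesky, orthant_likelihood_mc, and std_normal_cdf are hypothetical helpers introduced only for illustration.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def orthant_likelihood_mc(X, y, w, Sigma, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(y | X, w) under the ASSUMED model
    y_i = sign(x_i . w + eps_i), eps ~ N(0, Sigma).

    The event is y_i * (x_i . w + eps_i) > 0 for all i, i.e. a Gaussian
    integral over the positive orthant.
    """
    rng = random.Random(seed)
    low = cholesky(Sigma)
    n = len(y)
    # Noise-free linear predictor x_i . w for each data point.
    mean = [sum(xi * wi for xi, wi in zip(X[i], w)) for i in range(n)]
    hits = 0
    for _ in range(n_samples):
        # Correlated noise: eps = L u with u ~ N(0, I).
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]
        eps = [sum(low[i][k] * u[k] for k in range(i + 1)) for i in range(n)]
        if all(y[i] * (mean[i] + eps[i]) > 0 for i in range(n)):
            hits += 1
    return hits / n_samples


def std_normal_cdf(t):
    """Standard normal CDF, used for the independent-noise reference value."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


if __name__ == "__main__":
    X = [[1.0], [-0.5]]   # two data points, one feature (illustrative values)
    w = [0.8]
    y = [1, -1]
    independent = [[1.0, 0.0], [0.0, 1.0]]
    correlated = [[1.0, 0.5], [0.5, 1.0]]
    print("independent noise:", orthant_likelihood_mc(X, y, w, independent))
    print("correlated noise: ", orthant_likelihood_mc(X, y, w, correlated))
```

With a diagonal Sigma the likelihood factorizes into a product of one-dimensional normal CDFs, which recovers ordinary probit regression. With off-diagonal entries no such factorization exists, and sampling of this kind becomes costly as the number of labels grows, which is the difficulty that motivates an approximate method such as EP.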

Similar resources

Separating Sparse Signals from Correlated Noise in Binary Classification

Among the goals of statistical genetics is to find sparse associations of genetic data with binary phenotypes, such as heritable diseases. Often, the data are obfuscated by confounders such as age, ancestry, or population structure. A widely appreciated modeling paradigm which corrects for such confounding relies on linear mixed models. These are linear regression models with correlated noise, ...

An Effective Approach for Robust Metric Learning in the Presence of Label Noise

Many algorithms in machine learning, pattern recognition, and data mining are based on a similarity/distance measure. For example, the kNN classifier and clustering algorithms such as k-means require a similarity/distance function. Also, in Content-Based Information Retrieval (CBIR) systems, we need to rank the retrieved objects based on the similarity to the query. As generic measures such as ...

Robit Regression: A Simple Robust Alternative to Logistic and Probit Regression

Logistic and probit regression models are commonly used in practice to analyze binary response data, but the maximum likelihood estimators of these models are not robust to outliers. This paper considers a robit regression model, which replaces the normal distribution in the probit regression model with a t-distribution with a known or unknown number of degrees of freedom. It is shown that (i) ...

Nested Expectation Propagation for Gaussian Process Classification with a Multinomial Probit Likelihood

We consider probabilistic multinomial probit classification using Gaussian process (GP) priors. The challenges with the multiclass GP classification are the integration over the non-Gaussian posterior distribution, and the increase of the number of unknown latent variables as the number of target classes grows. Expectation propagation (EP) has proven to be a very accurate method for approximate...

EM Algorithm for MLE of a Probit Model for Multiple Ordinal Outcomes

The correlated probit model is frequently used for multiple ordered data since it allows different correlation structures to be incorporated seamlessly. The estimation of the probit model parameters based on direct maximization of the limited information maximum likelihood is a numerically intensive procedure. We propose an extension of the EM algorithm for obtaining maximum likelihood estimates fo...

Publication date: 2014